SheetReader: Efficient Specialized Spreadsheet Parsing

Abstract

Spreadsheets are widely used for data exploration. Since spreadsheet systems have limited capabilities, users often need to load spreadsheets into other data science environments to perform advanced analytics. However, current approaches for spreadsheet loading suffer from either high runtime or high memory usage, which hinders data exploration on commodity systems. To make spreadsheet loading practical on commodity systems, we introduce a novel parser that minimizes memory usage by tightly coupling decompression and parsing. Furthermore, to reduce the runtime, we introduce optimized spreadsheet-specific parsing routines and employ parallelism. To evaluate our approach, we implement prototypes for loading Excel spreadsheets into R and Python environments. Our evaluation shows that our approach is up to 3× faster while consuming up to 40× less memory than state-of-the-art approaches. The source code is available at https://github.com/fhenz/SheetReader-r.
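
The memory-saving idea of coupling decompression with parsing can be illustrated with a minimal Python sketch. It assumes the standard .xlsx layout (a ZIP archive whose first worksheet is stored as XML at `xl/worksheets/sheet1.xml`) and uses only the standard library's `zipfile` and `xml.etree.ElementTree.iterparse`. The function names `stream_cells` and `cell_ref_to_index` are hypothetical, and this is not SheetReader's own parser; it only shows the principle that the parser consumes bytes as they are inflated, so the fully decompressed sheet never has to be materialized in memory.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used by worksheet XML inside an .xlsx archive
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def cell_ref_to_index(ref):
    """Convert an A1-style reference such as 'B3' to zero-based (row, col)."""
    col = 0
    i = 0
    # Column letters form a bijective base-26 number: A=1, ..., Z=26, AA=27
    while i < len(ref) and ref[i].isalpha():
        col = col * 26 + (ord(ref[i].upper()) - ord("A") + 1)
        i += 1
    return int(ref[i:]) - 1, col - 1


def stream_cells(xlsx_source, member="xl/worksheets/sheet1.xml"):
    """Yield (row, col, raw_value) while decompressing and parsing together.

    Hypothetical helper: raw_value is the text of the cell's <v> element;
    shared strings, types, and parallelism are not handled here.
    """
    with zipfile.ZipFile(xlsx_source) as archive:
        # archive.open returns a file object that inflates data on demand,
        # so iterparse reads decompressed bytes as they are produced
        with archive.open(member) as sheet:
            for _event, elem in ET.iterparse(sheet, events=("end",)):
                if elem.tag == NS + "c":
                    value = elem.find(NS + "v")
                    row, col = cell_ref_to_index(elem.get("r"))
                    yield row, col, None if value is None else value.text
                elif elem.tag == NS + "row":
                    # Drop the finished row's cells to keep memory bounded
                    elem.clear()
```

For example, `list(stream_cells("data.xlsx"))` would return the cells of the first worksheet in document order. Resolving shared strings (cells with `t="s"`), converting values to typed columns, and splitting work across threads are deliberately left out of this sketch.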

Similar Resources

Shallow Parsing using Specialized HMMs

We present a unified technique to solve different shallow parsing tasks as a tagging problem using a Hidden Markov Model-based approach (HMM). This technique consists of the incorporation of the relevant information for each task into the models. To do this, the training corpus is transformed to take into account this information. In this way, no change is necessary for either the training or t...

Parsing Speech Repair without Specialized Grammar Symbols

This paper describes a parsing model for speech with repairs that makes a clear separation between linguistically meaningful symbols in the grammar and operations specific to speech repair in the operation of the parser. This system builds a model of how unfinished constituents in speech repairs are likely to finish, and finishes them probabilistically with placeholder structure. These modified...

Efficient Transformation-Based Parsing

In transformation-based parsing, a finite sequence of tree rewriting rules are checked for application to an input structure. Since in practice only a small percentage of rules are applied to any particular structure, the naive parsing algorithm is rather inefficient. We exploit this sparseness in rule applications to derive an algorithm two to three orders of magnitude faster than the standard...

Efficient Bottom-Up Parsing

This paper describes a series of experiments aimed at producing a bottom-up parser that will produce partial parses suitable for use in robust interpretation and still be reasonably efficient. In the course of these experiments, we improved parse times by a factor of 18 over our first attempt, ending with a system that was twice as fast as our previous parser, which relied on strong top-down...

Learning Efficient Parsing

A corpus-based technique is described to improve the efficiency of wide-coverage high-accuracy parsers. By keeping track of the derivation steps which lead to the best parse for a very large collection of sentences, the parser learns which parse steps can be filtered without significant loss in parsing accuracy, but with an important increase in parsing efficiency. An interesting characteristic...

Journal

Journal title: Information Systems

Year: 2023

ISSN: 0306-4379, 1873-6076

DOI: https://doi.org/10.1016/j.is.2023.102183